Scraping is either enabled globally or explicitly.
For explicit scraping add the following annotation for Pods or services.
---
apiVersion: v1
kind: Service
metadata:
name: my-service
annotations:
prometheus.io/scrape: "true"
prometheus.io/path: "/metrics" # optional
prometheus.io/port: "9102"
Note that for Daemonsets you have to put the annotation in the template spec:
---
apiVersion: apps/v1beta2
kind: DaemonSet
spec:
[...]
template:
metadata:
[...]
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '9102'
From https://prometheus.io/docs/prometheus/latest/querying/examples/
Simple metric and maching by label
http_requests_total
http_requests_total{job="apiserver", handler="/api/comments"}
Range query for 5min
http_requests_total{job="apiserver", handler="/api/comments"}[5m]
Pattern matching with regex
http_requests_total{job=~".*server"}
http_requests_total{status!~"4.."}
Note that when using regex pattern you have to think of them being enclosed with ^ and $ markers.
So if you want to match abc in the middle of a label value you need to specify .*abc.*!
Aggregate and group by
sum(rate(http_requests_total[5m])) by (job)
sum by(a,b)(mymetric{field="value")
Math expressions
(instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024
Top keys
topk(3, sum(rate(instance_cpu_time_ns[5m])) by (app, proc))
Count and group
count(instance_cpu_time_ns) by (app)
Default a count returning "no value" to 0
count(some_metric) or vector(0)
Find absent metrics (e.g. for alerting on missing metrics)
absent(some_metric{label1="value1", label2="value2"})
absent_over_time(some_metric{label1="value1", label2="value2"}[15m])
If you cannot specify an absent expression because you cannot specify the dynamic
labels you can follow this approach
using offset and unless:
max_over_time( your_metric[4h] ) offset 1h
unless your_metric
On the fly series label replacing: create namespace2 from .* match on label namespace:
label_replace(http_requests_total, "namespace2", "$1", "namespace", "(.*)")
A good cheat sheet is https://promlabs.com/promql-cheat-sheet/
Prometheus is very strict when a single syntax error occurs on a metric endpoint and then
ignores the entire endpoint. To check which line causes the error run promtool check metrics.
promtool is included in your prometheus pod. It also is a static binary you can just
copy anywhere else to use:
curl -s localhost:8080/metrics | promtool check metrics
See https://last9.io/blog/mastering-prometheus-relabeling-a-comprehensive-guide/
If you create new recording rules it is possible to compute them for older data too: https://github.com/prometheus/prometheus/blob/main/docs/storage.md#backfilling-for-recording-rules
- GrafanaCloud
- Influx
- Blackbox Exporter (allows defining TCP, HTTP... checks for non-k8s endpoints)
- Adding many Blackbox Exports
- Running multi-target exporters
- Mixins for different tools
- Grafana k8s Cost Report